ABSTRACT
A frequency list of words in the titles has been compiled in order to
use a weighting procedure for sorting the printout. Training to acquaint
users with this new service is needed.



COUNCIL OF EUROPE 



CONSEIL DE L' EUROPE 



STRASBOURG, 11th November 1972                                DECS/DOC (72) 15

U.S. DEPARTMENT OF HEALTH, EDUCATION & WELFARE
OFFICE OF EDUCATION
THIS DOCUMENT HAS BEEN REPRODUCED EXACTLY AS RECEIVED FROM THE PERSON OR
ORGANIZATION ORIGINATING IT. POINTS OF VIEW OR OPINIONS STATED DO NOT
NECESSARILY REPRESENT OFFICIAL OFFICE OF EDUCATION POSITION OR POLICY.



COUNCIL FOR CULTURAL CO-OPERATION 



AD HOC COMMITTEE FOR EDUCATIONAL DOCUMENTATION AND INFORMATION 



EUDISED PROJECT 



THE USE OF ERIC TAPES IN SCANDINAVIA, SEARCHING WITH THESAURUS
TERMS IN NATURAL LANGUAGE



Björn V. Tell, Kerstin Wengrca and Winnie Hemborg



Royal Institute of Technology
Stockholm 







SUMMARY 



Since February 1971 the Royal Institute of Technology, Stockholm, has been running the ERIC data base,
mainly for SDI purposes. The implementation of the data base into the generalized search system, ABACUS,
is described. 158 users receive SDI service at present: 99 from governmental and educational institutions,
23 from industry, and 36 from abroad (Finland 26, Norway 8, Switzerland 1 and the United Kingdom 1).
Retrospective searches have also been made.

Two methods of matching users to documents have been employed for the ERIC data base: the controlled
vocabulary of the Thesaurus of ERIC Descriptors, and the free text words of titles. Some users' assessments of
the relevance of the output have been gathered, and examples are given of query formulation into profiles and
the resulting printout of references.

A number of the profiles for the ERIC data base have also been run on data bases such as ISI, INSPEC and
COMPENDEX. The practice of writing profiles which contain term types which are appropriate to only one of
several data bases, against which they are searched, is discussed. A frequency list of words in the titles has
been compiled in order to use a weighting procedure for sorting the printout in a helpful order. A training
programme for acquainting the user with this new service has been needed. However, the present results show
that a great number of users have found it of interest to use the SDI service in their work. On the other hand,
many of the queries have also had to be searched on other data bases in order to assure a reasonable coverage.

1. INTRODUCTION

The ERIC data base is run by the Royal Institute of Technology Library. Usually, the library functions are
those of acquisition, cataloguing, storage and circulation. How did it happen then that the Institute considered
it within its scope to include machine-readable data bases such as ERIC, and provide an information service
based on them? Why should the library offer an information service which was not otherwise available? Could
it justify the costs of acquiring and maintaining mechanized data bases and the computer operations? This
paper will try to answer these questions.

The Swedish government has taken an active interest in developing a policy for economic growth. In 1967
it launched a programme for the promotion of technological development and industrial growth, and a plan for
the development of scientific and technical information was included. The government was especially interested
in studying the viability of mechanized information services in the field of science and technology and the
utility they could offer to users in research and industry. The Institute library was chosen as the agency
responsible for the establishment of a mechanized service for users in science, industry and education.

The requirements for the computer operation of a service had been thoroughly studied during Tell's years
as department manager of the Swedish nuclear establishment, AB Atomenergi. Then in 1967 the Institute
library received a first grant of 80,000 Sw.Cr. ($ 16,000) to initiate a computerized service in the field of
mechanical engineering. Over the years the scope has extended and the grant has increased, and it has now
stabilised around 1 million Sw.Cr. outside the ordinary budget of the library. Half of that sum goes to the
salaries for documentalists who have been added to the library staff. Thus, the fundamental requirements
for staff and funds have been fulfilled by the new policy.

2. THE ORGANISATION OF A NEW COMPUTERIZED SERVICE

In order to start a computerized service the best choice, at least at the time, seemed to be a current
awareness service - SDI - Selective Dissemination of Information. SDI is a system developed by the late
Hans Peter Luhn at IBM in 1959 for alerting participants about new publications such as journal articles,
reports, conference papers etc. The acronym SDI has the special connotation that the process makes use
of a computer. This is possible when the references to the literature are stored on magnetic tape
TABLE I. The Reformatting of ERIC Report Resume Master Data into the ABACUS Format.

ERIC                                                ABACUS
Field name                     Field identifi-      Searchable /    Deleted
                               cation no. in        Printout
                               hexadecimal

Sequence                       0000                                 V
Add Date                       0001                                 V
Change Date                    0002                                 V
Accession Number               0010                 X
Clearinghouse Accession
  Number                       0011                 X
* Other Accession No.          0012                                 X
* Program Area                 0014                                 V
* Publication Date             0017                 X
Title                          001A                 X X
Personal Author                001B                 X X
* Institution Code             001C                                 X
* Sponsoring Agency Code       0020                                 X
Descriptor                     0023                 X
Identifier                     0024                                 X
* EDRS Price                   0025                                 X
* Descriptive Note             0026                                 X
Issue                          002B                 X
Abstract                       002C                                 X
* Report Number                002D                 X
* Contract Number              002E                                 X
* Grant Number                 002F                                 X
* Bureau Number                0030                                 X
* Availability                 0031                                 X
Journal Citation               0032                 X X
* Institution Name             0080                 X
Sponsoring Agency Name         0084                                 X

* Not used in CIJE
3. IMPLEMENTATION OF THE ERIC DATA BASE INTO ABACUS

The basic approach employed has been to use a general processing format into which a record of a
particular output such as the ERIC files can be converted by a reformatting program so that its records
can be searched. Thus, the search routine will be the same as for records of other systems similarly
converted by individual reformatting programs.

The success of this pragmatic approach to the compatibility problem of various tape formats greatly
depends upon the hospitality of the search record format. ABACUS was designed in 1966, before the
MARC pilot program and the interchange format reflected in International Standard ISO/DIS 2709 (Coward 2),
which is foreseen as the standard for UNISIST and EUDISED. However, the ABACUS record has many
characteristics in common with MARC and ISO. A directory to the whole record maps out the record length,
the data elements present, and the number of characters in each element. The directory is a fixed field
header followed by variable data fields. The fixed fields give the address to, and the length of, the variable
fields. The items of interest in the external data base are selected, and fields in the ABACUS format are
allocated by the reformatting program. Depending on the amount of information on the external tape, the
identification process differs from one format to another.

The most extensive format in ERIC is the Report Resume Master Data Set, many of whose fields are not
applicable in the shorter format of the Journal Article Master Data Set. Not all fields are of interest to the
Scandinavian users. Thus, at present, some fields are deleted when reformatting into ABACUS. Table I
shows the ERIC Report Resume Master Data Set fields and their treatment in the ABACUS record. Even if
documentation is provided by a data base producer, the reformatting specification is written after inspection
of tape dumps.

In general, the reformatting of the ERIC tape formats was a rather straightforward job of 30 hours
programming, even if they deviated from the ISO interchange format into which, it is hoped, they will
eventually change. ERIC files in their present form are grouped in variable length blocks, the maximum
length of which is included in the label. The first two bytes of each block specify the length of the block.
Similarly, the first two bytes of each record specify the length of the record. Within each record, the first two
bytes of each field specify its length and the third and fourth are the field identification number.
Essentially, the allocation of fields in the ABACUS program depends on the field identification numbers
within the two ERIC record types for reports and articles. As can be seen from Table I, the 26 fields in the
ERIC format yield 5 fields in the ABACUS set of searchable fields. The search terms can operate within
these, since they are specified with regard to the type of field in which they are to be searched.
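
The field layout described above (a two-byte length followed by a two-byte field identification number) can be illustrated with a minimal Python sketch. Several details are assumptions made only for the illustration and are not stated in this paper: the byte order is taken as big-endian, the length is taken to count the whole field including its four-byte header, the data are shown as ASCII, and the choice of which Table I field numbers to keep is arbitrary. The helper names `parse_fields` and `make_field` are hypothetical.

```python
import struct

def parse_fields(record: bytes):
    """Split one record body into (field_id, data) pairs.

    Assumed layout: 2-byte big-endian length (counting the 4-byte header),
    2-byte field identification number, then the field data.
    """
    fields = []
    pos = 0
    while pos + 4 <= len(record):
        length, field_id = struct.unpack_from(">HH", record, pos)
        if length < 4 or pos + length > len(record):
            raise ValueError(f"bad field length {length} at offset {pos}")
        fields.append((field_id, record[pos + 4 : pos + length]))
        pos += length
    return fields

def make_field(field_id: int, data: bytes) -> bytes:
    """Build one field in the assumed layout (used only to create test data)."""
    return struct.pack(">HH", 4 + len(data), field_id) + data

# Field numbers taken from Table I; which ones to keep is an assumption here.
KEPT = {0x001A: "title", 0x001B: "author", 0x0023: "descriptor"}

record = (
    make_field(0x0010, b"ED012345")
    + make_field(0x001A, b"TEACHING MACHINES")
    + make_field(0x0023, b"PROGRAMED INSTRUCTION")
)
parsed = parse_fields(record)
kept = {KEPT[fid]: data.decode("ascii") for fid, data in parsed if fid in KEPT}
print(kept)
```

A reformatting program in this style only needs a table from field identification numbers to target fields; fields without an entry are dropped.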

4. PROFILE CHARACTERISTICS

The construction and revising of query profiles is another essential task in an SDI system which demands
an effort from the user and the subject specialist. When a user wants to submit a question to the SDI system
he is requested to formulate his field of interest in natural language, which means in a normal narrative way,
describing his interest in some detail. It has proved very useful for the user also to supply some references to
papers which he considers relevant to his query. He could also provide a list of significant terms and, if
possible, make a draft of the actual search profile. The staff have prepared a Profile Design Manual which
explains the principles of a computer-operated information retrieval system and describes all details of the
profile construction.

Within the research and development programme of the Swedish National Board of Education the
introduction of the ERIC tapes has also resulted in a report by Bernhard Bierschenk. Under the title
"To Search for Literature by Means of Computers" (in Swedish), the author elaborates upon the
information and communication problems in education. The construction of profiles is given a broad
treatment, and much of it has been copied from the Profile Design Manual. Tools of this nature help the
user to visualize the usage he can make of the computerized service and aid him in developing his
individual profile.
FIGURE 1. GENERALIZED FLOW CHART FOR PROFILE CONSTRUCTION

Steps:

  The SDI subscriber provides a narrative description of the request for information
  The documentalist checks his interpretation of the subject
  Statement of specific search words, synonyms, related subject phrases, narrower terms,
    broader terms, author names, journal titles, author affiliations
  Truncation of terms
  Construction of search logic, which defines the subject by combining groups of search
    terms using logical operators
  Search terms and search logic coded for input
  Search terms and search logic punched and read into computer
  Profile stored in magnetic tape profile store
  Search
  Subscriber evaluates profile and references
  Subscriber satisfied: Yes / No

Aids used:

  References relevant to the subject provided by the customer
  Thesauri, glossaries, handbooks, indexes and titles in abstract journals
  Manual for profile construction
  Manual for profile input codes

Data bases searched:

  ISI Source Tape
  KTH Mech-Eng
  POST
  INSPEC
  Metals Abstracts Index
  Current Index to Conference Papers in Engineering, Chemistry & Life Sciences
  COMPENDEX
  Nuclear Science Abstracts
  Abstract Bulletin of The Institute of Paper Chemistry
  Food Science and Technology Abstracts
  ERIC Master Files
  Scientific and Technical Aerospace Reports
  International Aerospace Abstracts

The interaction between the staff and the user is essential for a successful search. On the basis of the
user's statements the subject specialist specifies the question by making a list of significant terms, either
from the ERIC Thesaurus, or terms which might occur as potential words in the titles of documents. Among
the staff there are subject specialists in education, psychology, business administration, etc. Furthermore,
the list might also include authors, affiliations, and journal titles. As the system permits search both on
keywords and on natural language used in titles, the subject specialist uses thesauri, handbooks, dictionaries
and all other means he might find helpful and relevant for the formulation of the profile. He has to make a
special point of checking the printed volumes of Current Index to Journals in Education, and Research in
Education, and other appropriate sources to find the occurrence of terms when used alone or in combination
with other terms. A generalized flow chart, Figure 1, has been constructed by Zofia Gluchowicz (3).

While the keywords must be written exactly as they appear in the Thesaurus and on the tape, the free
text terms in potential titles can be truncated both at the beginning and at the end. Truncation facilitates
retrieval of items containing word fragments which are common to different forms of a word, and words
within words can be searched for. As will be seen from the examples below, suffix (right-hand) truncation
occurs very often, while prefix (left-hand) truncation is more unusual. Combined suffix and prefix truncation
is, on the other hand, more common. For example, the truncated term /CASSETT/, where the slashes stand
for truncations, will retrieve STBUOCASSETTES, VIDEOCASSETTE, CASSETTE-RECORDER, CASSETTE/
CARTRIDGE, etc.
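
The truncation rule just described can be sketched in a few lines of Python. This is an illustration of the slash notation only, not the ABACUS or VIRA code; the function name `matches` is hypothetical, and the comparison is assumed to be made against upper-case words.

```python
def matches(term: str, word: str) -> bool:
    """Return True if `word` satisfies the (possibly truncated) search `term`.

    A leading slash means left-hand (prefix) truncation, a trailing slash
    means right-hand (suffix) truncation, as in /CASSETT/.
    """
    left = term.startswith("/")
    right = len(term) > 1 and term.endswith("/")
    stem = term.strip("/")
    if left and right:
        return stem in word            # stem may occur anywhere in the word
    if right:
        return word.startswith(stem)   # anything may follow the stem
    if left:
        return word.endswith(stem)     # anything may precede the stem
    return word == stem                # no truncation: exact match

for word in ("VIDEOCASSETTE", "CASSETTE-RECORDER", "CARTRIDGE"):
    print(word, matches("/CASSETT/", word))
```

With right-hand truncation only, CASSETT/ would miss VIDEOCASSETTE, which is why the combined form is useful for compound words.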

As can be seen from Figure 2, the terms are numbered sequentially in the profile printout to facilitate
updating. The terms are also grouped together, and the groups are indicated by capital letters A, B, C, etc.
Terms, or groups of terms, are linked together in a logical manner by using "and", "or", and "not" logic.
The number of terms in one profile might be up to the system-allowed 150 in ABACUS; in the new VIRA
program there are no such restrictions. On the other hand, as charging policy is to count 30 terms as one
profile, the average number of terms per profile varies around 24.
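
As an illustration of how term groups and search logic combine, the following Python sketch evaluates a logic of the form A & B against a title. It is a simplified assumption of how such a profile could be evaluated: only right-hand truncation is handled, titles are split on blanks, and the logic is passed as an ordinary Python function. The names `term_matches` and `evaluate_profile` are hypothetical, and the terms are a small selection from Profile 70E in Figure 2.

```python
def term_matches(term: str, word: str) -> bool:
    """Exact match, or prefix match when the term ends with a slash."""
    if term.endswith("/"):
        return word.startswith(term[:-1])
    return word == term

def evaluate_profile(groups, logic, title: str) -> bool:
    """A group is 'hit' when any of its terms matches any word of the title;
    `logic` then combines the group hits with and/or/not."""
    words = title.upper().split()
    hit = {
        name: any(term_matches(t, w) for t in terms for w in words)
        for name, terms in groups.items()
    }
    return logic(hit)

groups = {
    "A": ["AUDIOVISUAL/", "TELEVISION", "CASSETT/"],
    "B": ["RETARD/", "SLOW/", "DROPOUT/"],
}
a_and_b = lambda hit: hit["A"] and hit["B"]   # Logic: A & B

print(evaluate_profile(groups, a_and_b, "Television for retarded children"))
print(evaluate_profile(groups, a_and_b, "Television in the classroom"))
```

Loosening the logic (for example A alone) increases the output and the noise; tightening it with further "and" or "not" groups does the opposite.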

The printout of the profile even includes a description in natural language of the query, the search logic,
and the list of terms classified according to type of terms such as words, keywords, author names etc. The
profile printout and every updating of it is sent to the user. For verification a copy of the profile as well as a
copy of the search results are kept in the files of the service, transferred every 9 months into microfilm
cassettes.

The user's responses to early selections based on the first profile approximation to his field of interest
are used for improving the profile. Thus the maintenance of the profile is carried out by adding new terms
and deleting old ones which do not give satisfactory results, or by opening and tightening the logic. False
co-ordinations between search terms from different term groups can also be detected and should be avoided.

While constructing the initial profile we try to choose the logical strategy considering the user's wishes,
and accordingly decide on the degree of restrictivity for the initial computer run. Often we use a less
restrictive logic, i.e., not too many "and" or "not" restrictions, in the initial profile, even if it will result
in an output of many irrelevant references, i.e., noise, and then after a few searches adjust the profile on
the basis of the user's evaluation of the output.

5. PROCESSING METHODS AND COSTS

An inevitable characteristic of large retrieval systems is that a strategy for searching a small or medium
size data base might differ significantly from a search strategy for a large base. During the years our search
methods have passed through the mere masking-off technique, yielding search times proportional to the number
of references and terms in the profiles, into a more elaborate technique making use of hashcoding and tree
structure searches, thus arriving at an almost logarithmic increase in time when the number of terms in the
profiles grows. The newest program, having the acronym VIRA and written by Rolf Larsson, is run in
parallel with ABACUS (Zennaki 4). The present profile program, PROSA, includes 2,500 statements in
COBOL, and the VIRA search program counts 2,000 statements in IBM assembler language.
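
The gain from hashcoding can be illustrated with a small Python sketch. It is not the VIRA algorithm, whose details are not given here; it only shows the general idea that, once all profile terms are placed in a hash table, each title word costs one lookup regardless of how many terms the profiles contain, instead of one comparison per term. Only untruncated terms are handled (truncated terms would need a tree structure), and the profile identifiers and function names are hypothetical.

```python
from collections import defaultdict

def build_index(profiles):
    """Map every search term to the set of profiles that contain it."""
    index = defaultdict(set)
    for profile_id, terms in profiles.items():
        for term in terms:
            index[term].add(profile_id)
    return index

def search(index, title: str):
    """Return {profile_id: matched words} using one hash lookup per word."""
    hits = defaultdict(set)
    for word in title.upper().split():
        for profile_id in index.get(word, ()):
            hits[profile_id].add(word)
    return dict(hits)

profiles = {
    "70E": ["TELEVISION", "TV"],
    "64G": ["OBSERVATION", "TELEVISION"],
}
index = build_index(profiles)
result = search(index, "Observation of classroom television")
print(result)
```

The index is built once per run, so its cost is shared by all references on the tape.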
FIGURE 2.

Profile 70E
Subject: Audiovisual aids for the mentally retarded
Data bases: ERIC, ISI, INSPEC
Logic: A & B

Term   Term    Search terms                    Weight   Term
No.    Group                                            Type

010    A       TAPE RECORD/                       2     KEYWORD
020    A       VIDEO TAPE RECORD/                 2     KEYWORD
030    A       EDUCATIONAL TELEVISION/            2     KEYWORD
040    A       INSTRUCTIONAL TELEVISION/          2     KEYWORD
050    A       AUDIOVISUAL/                      10     KEYWORD
060    A       CASSETT/                           2     WORD
070    A       CARTRIDGE/                         2     WORD
080    A       EVR                                2     WORD
090    A       VTR                                2     WORD
100    A       VCR                                2     WORD
110    A       ETV                                2     WORD
120    A       ITV                                2     WORD
130    A       CTV                                2     WORD
140    A       SELECTAVISION/                     2     WORD
150    A       TELEVISION                         2     WORD
160    A       TV                                 2     WORD
170    A       /VIDEO/                            2     WORD
180    A       CARTRIVISION/                      2     WORD
190    A       8mm/                               2     WORD
200    A       AUDIOVISUAL/                      10     WORD
210    A       AV                                10     WORD
220    A       A-V                               10     WORD
230    A       VIDICORD/                         10     WORD
240    A       VISUAL AID/                       10     WORD
250    A       MEDIA/                             2     WORD
260    A       PICTURE/                           2     WORD
270    A       LONG-DISTANC/                     10     WORD
280    A       AUDIO-VISUAL                      10     WORD
290    B       EDUCATIONALLY DISADVANTAG/         2     KEYWORD
300    B       LOW ABILIT/                        2     KEYWORD
310    B       SLOW LEARNER/                      2     KEYWORD
320    B       MENTALLY HANDICAP/                10     KEYWORD
330    B       EDUCABLE MENTALLY HANDICA/        10     KEYWORD
340    B       RETARDED/                         10     KEYWORD
350    B       RETARDATION/                      10     KEYWORD
360    B       MENTAL RETARDATION/               10     KEYWORD
370    B       EXCEPTIONAL/                       2     KEYWORD
380    B       SPECIAL/                           2     KEYWORD

Terms 390-470 (group B, type WORD): RETARD/, LOW/, SLOW/, FAILUR/, DISADVANTAG/, BELOW/,
EXCEPTION/, DROPOUT/ and one term that is illegible in the source. The weights of terms
390-470, in order, are 10, 2, 2, 2, 2, 2, 10, 2, 2.

The ERIC files have been run with the ABACUS program for the SDI service, while for retrospective
searches the combination of ABACUS and VIRA has proved more economical. During 18 months, from
February 1971 to the time of writing, ten SDI runs and one retrospective search, divided into three batches,
have been performed. The search statistics for these runs are shown in Tables II and III. It is obvious that
the VIRA program is more efficient when, e.g., Run 2 in Table II is compared with Batch B in Table III.
The costs of VIRA are less than one [illegible] of those of ABACUS, taking account
of the present pricing structure per hour for the two computers used, IBM 360/30 and 360/75. Table III
gives the computer time for the retrospective search. A sequential search of a large data base is often
regarded as excessively time consuming without compression methods. However, the VIRA program
permits such searches to be made economically. The search time for about 60,000 ERIC records using
42 profiles containing 1,137 search terms was less than 20 minutes CPU time and 40 minutes input/output
time. When [illegible] time is added, the costs of the data processing amount to 3,100 Sw.Cr.
The price charged to the user was equivalent to the price level
set by the European Space Research Organization, ESRO, for searching their files, 300 Frs. The balance
also covered part of the cost of constructing the profiles.

To monitor the performance of the profiles on a "management by exception"
basis, two statistical tools have been developed. The critical values of the printout to a user are (1) an
over-abundance of references, and (2) no printout. In order to reveal these extremes, every search results
in [illegible] of the profile. The form is designed like the
scale of the speedometer of many cars: the longer the row of "stars", the more the reason to put one's
foot on the brake. Figure 3 displays part of the search statistics for Run 7 of ERIC. The columns give
the number of references in the first digit, the second, etc. Thus, the first profile has resulted in
[illegible] references, the second in 8 + 60 + 300 = 368 references. On the other hand, profile
no. 26R has given no output. Furthermore, at the bottom of the form an indication is given of which
profiles have received no hits, and those which have received more than 40 hits.
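
The "speedometer" form can be imitated with a short Python sketch. The exact layout of the original printout is not recoverable, so the column widths, the profile identifiers, and the function name `exception_report` are assumptions; only the principle (one column of stars per decimal digit, plus the two exception lists with the limit of 40 hits) follows the text.

```python
def exception_report(hits_by_profile, limit=40):
    """Build one star row per profile and collect the two kinds of exception.

    Each row shows the units, tens and hundreds digits of the hit count as
    rows of stars, so 368 hits gives 8, 6 and 3 stars.
    """
    lines, no_hits, too_many = [], [], []
    for profile, hits in hits_by_profile.items():
        units, tens, hundreds = hits % 10, (hits // 10) % 10, hits // 100
        row = f"{profile:<6}{'*' * units:<10}{'*' * tens:<10}{'*' * hundreds}"
        lines.append(row.rstrip())
        if hits == 0:
            no_hits.append(profile)
        elif hits > limit:
            too_many.append(profile)
    return lines, no_hits, too_many

run = {"01B": 368, "07F": 12, "26R": 0}   # hypothetical hit counts
lines, no_hits, too_many = exception_report(run)
print("\n".join(lines))
print("No hits:", no_hits)
print("More than 40 hits:", too_many)
```

Profiles in either exception list are the ones whose terms and logic should be reviewed first.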

These search statistics give an indication of where the exceptional cases are located among the
profiles. [illegible] in order to find
[illegible] which terms or term combinations
have caused the printout, together with the frequencies of these terms. See Figure 4. In this case the first
step would be to analyse the combination MEASUREMENT TECHNIQUES and MEASUREMENT INSTRUMENTS,
which occurs 13 times, perhaps in order to change the logic or to place these words in separate groups if they
have given rise to many irrelevant references. The second column in Figure 4 indicates the weights we are
experimenting with, which will be discussed later on.
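
A coincidence list of the kind shown in Figure 4 can be sketched in Python as follows. How the weight of a combination is derived is not stated in this paper; the sketch assumes, purely for illustration, that it is the sum of the weights of the matching terms. The term weights, the sample data and the function name `coincidences` are hypothetical.

```python
from collections import Counter

def coincidences(matched_terms_per_reference, weights):
    """Count how often each combination of matching terms produced a hit.

    Returns rows (frequency, combined weight, terms), most frequent first.
    The combined weight is ASSUMED to be the sum of the term weights.
    """
    counts = Counter(tuple(sorted(terms)) for terms in matched_terms_per_reference)
    rows = [
        (n, sum(weights[t] for t in combo), combo) for combo, n in counts.items()
    ]
    rows.sort(key=lambda row: (-row[0], row[2]))
    return rows

weights = {
    "MEASUREMENT TECHNIQUES": 10,
    "MEASUREMENT INSTRUMENTS": 10,
    "OBSERVATION": 2,
    "TEACH": 2,
}
references = [
    {"MEASUREMENT TECHNIQUES", "MEASUREMENT INSTRUMENTS"},
    {"MEASUREMENT INSTRUMENTS", "MEASUREMENT TECHNIQUES"},
    {"OBSERVATION", "TEACH"},
]
rows = coincidences(references, weights)
for n, weight, combo in rows:
    print(n, weight, " * ".join(combo))
```

The most frequent combinations are the first candidates for a change of logic or for being moved into separate groups.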

6. SEARCHING KEYWORDS AND WORDS IN TITLES

[...] title, or abstracts. In the case of another data base, the Science Citation Index Source Tapes (the ISI
tapes), which among more than 2,000 journals include around 80 core journals in the field of education and
psychology, there are no keywords or subject indicators other than the titles. Thus free text search is the only
way [illegible] using a set of skeleton keys to open up any machine-readable [illegible].

The ERIC files make use of keywords chosen from the Thesaurus of ERIC Descriptors. Searching
these keywords becomes an additional means for the subject specialist or the user to augment the search
[illegible]. When a data base contains keywords, we have
recommended that they be used in combination with words in natural language.

In a multi-data-base environment the same profile in natural language can easily be used on various
data bases, while the use of keywords is restricted to each specific data base, which has to be taken into
[illegible] the profile. Many of our profiles on the ERIC tapes are also processed on the
[illegible],
disregarding from which data base the responding references will stem.
TABLE II.

ERIC Search Statistics for SDI, Feb. 1971 - July 1972, with ABACUS

Run    Data    No. ERIC    No.         No.        Search Time     Search Time
No.    Base    Records     Profiles    Answers    360/30 CPU      per Profile
                                                  Minutes         Minutes

 1     RIE       5402         38        2356          94             2.5
 2     CIJE     19427         38        3360         332             3.7
 3     RIE       2473         65        1514         135             2.3
 4     CIJE      4036         70        2793         218             3.1
 5     RIE       3932         72        4699         322             4.5
 6     CIJE      4263         82        3062         214             2.6
 7     RIE       5874         87        6095         341             3.9
 8     CIJE      4204         92        3099         257             2.8
 9     RIE       2867        132        4497         285             2.2
10     CIJE      4144        134        5269         510             2.4




TABLE III.

ERIC Search Statistics for one Retrospective Search in Batches with VIRA

Batch    No. ERIC    No.         No.        Conversion      Search Time 360/75    Print
No.      Records     Profiles    Answers    Time 360/30     Minutes               Time
                                            CPU Minutes     I/O        CPU        Minutes

A         12,285        42       10,097          72          14          7          49
B         19,919        42        6,696          60           8          4          24
C         27,575        42        9,677         103          17          3          33

Total     59,779        42       26,470         235          39         19         106
FIGURE 3. SEARCH STATISTICS FROM RUN 7

[Computer printout: one row of stars per profile, showing the number of references retrieved; at the
bottom, lists of the profiles that received no hits and of the profiles that received more than 40 hits.]

FIGURE 4. FREQUENCIES OF COINCIDENCES OF PROFILE 64G IN RUN 7

[Computer printout: each line gives the frequency of one combination of matching search terms, the
weight of the combination (VIKT), and the terms themselves, for example
"1  VIKT=32,00 * OBSERVATION* TEACH* METHOD*". The terms listed include CLASSROOM OBSERVATION
TECHNIQUES, MEASUREMENT TECHNIQUES, MEASUREMENT INSTRUMENTS and EVALUATION TECHNIQUES.]






Especially for questions of inter-disciplinary nature it is obvious that they should be processed on several
data bases in order to assure good coverage. It is true, however, that the reformulation of a query into a
profile for the SDI system takes place in a kind of dialogue with the computer, focusing on one data base at
a time and considering both the terminology used in free text and the metalanguage of keywords or other
subject indicators. In order to arrive at a standardization of the query formulation, allowing for different
degrees of complexity of natural text and metalanguages, one method would be to develop a translation
system between the various scientific disciplines reflected in the data bases by the generation of
vocabularies and concordances for words in natural language and the various thesauri used.

We have started work in this area by the compilation of word frequency lists for various data bases.
Thus, two years of ERIC CIJE 1969-70 tapes containing 27,575 references have been processed in order to
compile an alphabetic and a frequency ordered list of the words used in the titles +. Out of the 110,000 word
occurrences, 10,642 different words were recorded. The non-informative words like:

A OF THE AND IN FOR TO ON AN
AS WITH AT BY FROM OR SOME IS

account for 24 per cent of all word occurrences. The following 20 words account for another 10 per cent
(frequencies are given in parentheses):



EDUCATION(AL)  (1900)      CHILDREN     (614)      RESEARCH    (370)
SCHOOL(S)      (1287)      TEACHING     (493)      SOCIAL      (356)
PROGRAM(S)     (742)       LEARNING     (394)      LANGUAGE    (328)
TEACHER(S)     (708)       TRAINING     (392)      CURRICULUM  (304)
STUDY(IES)     (707)       COLLEGE      (380)      EVALUATION  (304)
REPORT         (671)       DEVELOPMENT  (380)      FINAL       (303)
STUDENT(S)     (587)       READING      (373)



It should be noted that the first significant word in the list is EDUCATION(AL), which has a frequency
placing it between the two prepositions FOR and TO (2053 and 1295). The information value of these 20 words
in the ERIC Thesaurus, in which all occur except for the last word FINAL, could be questioned. The word
EDUCATION(AL) is found in 7 per cent of the document titles.
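
The compilation of such a frequency list can be sketched in Python. The sketch is a modern illustration, not the program used for the ERIC tapes; the tokenisation rule (runs of letters and digits, upper-cased) and the function names are assumptions, while the list of non-informative words is the one given above.

```python
import re
from collections import Counter

NON_INFORMATIVE = {
    "A", "OF", "THE", "AND", "IN", "FOR", "TO", "ON", "AN",
    "AS", "WITH", "AT", "BY", "FROM", "OR", "SOME", "IS",
}

def title_word_frequencies(titles):
    """Count every word occurrence in the titles (upper-cased)."""
    counts = Counter()
    for title in titles:
        counts.update(re.findall(r"[A-Z0-9]+", title.upper()))
    return counts

def share(counts, words):
    """Fraction of all word occurrences accounted for by `words`."""
    total = sum(counts.values())
    return sum(counts[w] for w in words) / total

titles = ["The Education of Children", "Education and the School"]
counts = title_word_frequencies(titles)
significant = [(w, n) for w, n in counts.most_common() if w not in NON_INFORMATIVE]
print(significant)
print(share(counts, NON_INFORMATIVE))
```

Applied to the full title file, the same two functions would give both the frequency ordered list and the percentage figures quoted in the text.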

That the use of the language (the scientific "jargon") differs between various disciplines was illustrated
when compiling frequency lists for other disciplines. Thus, for instance, the first significant word in the
INIS system - nuclear energy - was REACTOR, and the first in CAC - organic chemistry - ACID. The
non-informative words mentioned above occur in almost the same order in these data bases as they do in
ERIC.

Of the 116 search profiles in Run 9 on ERIC RIE, 65 were also searched on the ISI tapes, and 34 of these
latter also on INSPEC tapes. 8 profiles contained only ERIC keywords, 67 both keywords and free text words,
and 39 were searched exclusively with free text words. On the average the keyword profiles contained
20 keywords. The mixed profiles had 11 keywords and 18 words, or together 29 search terms. The free
text profiles contained 39 words. In total 2,529 words and 930 keywords were used in this run.



+ See Appendix for a brief note on the merged frequency list of words in the bibliographic references
to [illegible] ERIC reports and 27,575 journal articles.

ERIC Search Profiles Organized into Broad Categories

                                                           No. of SDI
Subject Field                                              Profiles

 1. Administration                                              3
 2. Communication; Methods and Characteristics                 12
 3. Counselling                                            [illegible]
 4. Curriculum                                                  2
 5. Education and Instruction:
      General Education Concepts
      Specific Types of Education
      Instructional Techniques, -methods, -equipment
 6. Evaluation:
      Evaluation Techniques
      Tests and Measurement                                    11
 7. Health and Safety; Recreation                              21
 8. Language and Speech                                    [illegible]
 9. Library Science                                             5
10. Psychology:
      Learning and Cognition
      Development
      Behavior
      Attitudes
      Adjustment
11. Sociology:
      Environment
      Socialization
      Social Relations

    Total                                                     158

The 42 profiles which participated in the retrospective search contained 498 keywords and 639 words.
4 profiles included only keywords, 36 had keywords and words mixed, and 2 were formulated in natural
language words. The average number of keywords in the keyword profiles was slightly higher than in SDI
[...] average profile, 32 terms.

How efficient the keywords of the ERIC Thesaurus are can be judged, in a way, from the examples
[...]. This confronts EUDISED indirectly with what Jean Viet (5) calls the fundamental question of
whether it is really necessary to have a thesaurus at the input end.

The following remarks based upon our experience might illuminate this question. The combined
search strategy we use, mixing keywords and words in free text, reveals that the present indexing habit
in ERIC of using keywords identical to words in the title is futile. If some of the keywords instead took
the place of broad subject categories like those we have used for subdividing the user population in the
following Chapter, it would add a new dimension to the search. This is, for instance, the case with
another data base, the INSPEC.

A study should also be made about the proportion of titles that are not useful as content indicators and,
thus, not suitable for free text searching. If only a small proportion of titles are meaningless, manual
indexing using thesaurus keywords should be questioned.

[...] needs to be done, especially if we believe that keyword indexing is necessary for the quality
of printed indexes or for future on-line retrieval systems of the RECON type. Title [...] alternatives
to expensive manual [...] might lead authors to improve the information content of their titles. This
has happened in areas where the KWIC indexing technique is used.

[...] afford it for our own data base in mechanical engineering, wood, paper and pulp industry,
covering 250 journals (60.000 references/yr) in three languages. Only title augmentation is permitted
in the case of short titles (less than 60 characters). We know [...]. At present we receive orders for
[...] in these fields than in education, a case to investigate.
7. EVALUATION AND FEED-BACK 

At present 158 users receive SDI service on ERIC: 99 from governmental and educational institutions,
23 from industry, and 36 from abroad (Finland 26, Norway 8, Switzerland 1 and the United Kingdom 1).
A breakdown of the profiles into subject categories is found in Table IV.

After [...] of operation on tapes in general, and 18 months on ERIC, we feel that we are still
just scratching the surface of computerized information retrieval. We think, for instance, that the printout
we now deliver as answers to the queries should go through further refinement before reaching the user.

When we consider the construction of a profile as reflecting a specific query, it is difficult to provide
a measure of its effectiveness, especially as our practice is to retrieve references from multiple files.
Questions about recall and precision lose interest. The essential measure which we can [...] from
highly relevant to irrelevant, or by counting the number of documents he orders. Time and costs of the
computer are other factors which can be measured, and for ERIC we break even, more or less, between
computer costs plus the costs for the tapes [...], e.g. the construction of the [...] defined as common
library costs.

[...] of a particular researcher in education is illustrated by the following examples of some
rather simple profiles.
Example 1

The feedback problem is of interest to many in the educational field. The following profile has been
constructed in order to cover the specific interest of a user attached to the university training centre at
Lund in Sweden.



Profile 63G

Subject    : Feedback
Data bases : ERIC, ISI
Logic      : A & B or C or D

Term
Group     Search Term               Term Type

A         FEEDBACK                  WORD
B         IMMEDIATE                 WORD
B         DELAY                     WORD
B         PARTIAL                   WORD
B         TEACHING MACHINES         WORD
B         ACHIEVEMENT               WORD
B         PERFORMANCE               WORD
B         PROGRAMMED                WORD
B         PROGRAMED                 WORD
C         KNOWLEDGE OF RESULTS      WORD
D         FEEDBACK                  KEYWORD



It should be noted that spelling variants have to be written as single words, e.g. PROGRAMMED and
PROGRAMED. The output from four searches on the ERIC tapes has been evaluated by the user, if we use his
requests for photocopies as a feedback to the service. The first run resulted in 247 references, 105 of which
were picked up only by the keyword FEEDBACK, 97 only by the free text words, and 45 by both methods. Of
more interest is, perhaps, that the user ordered 7 of the documents retrieved by natural language only, 1
retrieved by the keyword, and 8 which could have come out by either method.
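
The way such a profile selects references can be sketched in a few lines of Python. This is only an illustration, not the ABACUS or VKA implementation: we assume that "&" binds more tightly than "or", so that the logic reads (A and B) or C or D, and that free text terms are matched as whole words or phrases in the title. The function names are hypothetical.

```python
import re

# Free text (title word) groups of Profile 63G. Group D is the thesaurus
# keyword FEEDBACK, matched against the descriptor field instead.
WORD_GROUPS = {
    "A": ["FEEDBACK"],
    "B": ["IMMEDIATE", "DELAY", "PARTIAL", "TEACHING MACHINES",
          "ACHIEVEMENT", "PERFORMANCE", "PROGRAMMED", "PROGRAMED"],
    "C": ["KNOWLEDGE OF RESULTS"],
}
KEYWORD_D = "FEEDBACK"


def title_has(title, term):
    """True if the term occurs in the title as a whole word or phrase."""
    return re.search(r"\b" + re.escape(term) + r"\b", title.upper()) is not None


def group_hit(title, group):
    """True if any term of the group occurs in the title."""
    return any(title_has(title, term) for term in WORD_GROUPS[group])


def matches_profile_63g(title, descriptors):
    """Evaluate A & B or C or D, read here as (A and B) or C or D (assumption)."""
    a = group_hit(title, "A")
    b = group_hit(title, "B")
    c = group_hit(title, "C")
    d = KEYWORD_D in {desc.upper() for desc in descriptors}
    return (a and b) or c or d
```

Under these assumptions a title containing only FEEDBACK is not retrieved by free text, but the same reference is retrieved when it carries the descriptor FEEDBACK.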

The results of the four runs were as follows:

                         Documents retrieved        Documents ordered
Retrieval method         No.       Per cent         No.       Per cent

Keyword only             153         42%             10         24%
Words only               127         35%             15         37%
Keywords/Words            83         23%             16         39%

                         363        100%             41        100%



It is a duplication of effort to use thesaurus terms which already exist in titles, a phenomenon that we
encounter here. The keyword FEEDBACK occurs also as title word in 23 per cent of the titles, which shows
the futility of repeating indexing terms which are identical with title words. It can be noted that one of
the references ordered from the set retrieved by the keyword only could have been retrieved by the free text
search if a left-hand truncation had been used before the word FEEDBACK. The title word was POSTFEEDBACK.



In this case over 11 per cent of the received references were of such interest that the user ordered
photocopies. On the average of all data bases the requests for photocopies as result of the SDI service
are lower, around 7 per cent.
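
The percentages quoted for Example 1 follow directly from the counts of the four runs. The short Python sketch below recomputes them; the variable and function names are ours and serve only the illustration.

```python
# Counts for Profile 63G over four runs, as given in the results table.
retrieved = {"keyword only": 153, "words only": 127, "keywords/words": 83}
ordered = {"keyword only": 10, "words only": 15, "keywords/words": 16}


def shares(counts):
    """Percentage share of each retrieval method, rounded to whole per cent."""
    total = sum(counts.values())
    return {method: round(100 * n / total) for method, n in counts.items()}


retrieved_shares = shares(retrieved)   # share of the 363 references retrieved
ordered_shares = shares(ordered)       # share of the 41 documents ordered

# Proportion of all received references for which photocopies were ordered.
order_rate = 100 * sum(ordered.values()) / sum(retrieved.values())
```

The order rate comes out a little above 11 per cent, in line with the figure stated above.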



Example 2 



In present-day society, with a trend to continuing education, a reappraisal of training methods and
teaching procedures is important. This raises the question of measurement techniques. The following
profile has been constructed to meet the need of a researcher at the training centre at Uppsala University:

Profile 64G

Subject    : Measurement techniques
Data bases : ERIC, ISI
Logic      : A & (B or C or D) or D & E or G & (F or H) or I

Term
Group     Search Term                           Term Type

A         INSTRUCT/                             WORD
A         TEACH/                                WORD
B         PROGRAMMED/                           WORD
B         PROGRAMED/                            WORD
B         AID/                                  WORD
B         MEDIA/                                WORD
B         INSTRUMENT/                           WORD
B         EQUIPMENT/                            WORD
C         ELECTRONIC/                           WORD
D         TECHNOLOGY/                           WORD
E         EDUCATION/                            WORD
F         PUPIL/                                WORD
F         CLASSROOM/                            WORD
G         OBSERVATION/                          WORD
H         TECHNIQUE                             WORD
          PROGRAMMED INSTRUCTION/               KEYWORD
          PROGRAMED MATERIALS/                  KEYWORD
          DIAGNOSTIC TESTS/                     KEYWORD
          ELECTRONICS/                          KEYWORD
          EVALUATION TECHNIQUES/                KEYWORD
          CLASSROOM OBSERVATION TECHNIQUES      KEYWORD
          MEASUREMENT TECHNIQUES/               KEYWORD
          MEASUREMENT INSTRUMENTS/              KEYWORD

The slashes stand for truncation, so any flexion form after the slash will be accepted in the free text
field, and any word after the slash in the keyword field if more words are used to define the concept the
keyword stands for. This profile lists 23 terms, 15 of which are free text words and 8 keywords. This is
one of the profiles in the retrospective search which in total covered 59.779 references. It resulted in
558 references, of which the user selected 55 by requesting photocopies; of these 26 were picked up by
keywords alone, 19 by free text, and 10 by both methods. Most efficient were the following search terms:



"Formular 3": printout for Profile 70 E (form fields: kontaktperson, företag/institution, postadress)

1   UNDERSTANDING THE LAW: A GUIDE FOR TEACHING THE MENTALLY RETARDED.
    BR-6-2883
    AUG 69
    VIKT=200,00 * AUDIOVISUAL * MENTALLY HANDICAPPED * [...]

2   [...] VOCABULARY TEST WITH THE EDUCATIONALLY HANDICAPPED [...]
    FITZGERALD, BERNARD J. AND OTHER
    JOURNAL OF SCHOOL PSYCHOLOGY; 8; 4; 296-299
    VIKT=100,00 * PICTURE * HANDICAP * [...]

3   [...]
    BEEDY, VERNON AND OTHER
    JAN 71
    VIKT=80,00 * MEDIA * RETARDATION * MENTAL RETARDATION [...]

4   [...]
    70
    VIKT=2[...] * AUDIOVISUAL * MEDIA * DISADVANTAGED * [...]

5   SIXTH IASLIC SEMINAR PAPERS. PART I: REFERENCE SERVICE-IN-ACTION. PART
    II: PROCESSING & SERVICING OF SPECIAL MATERIALS IN LIBRARIES.
    70
    VIKT=20,00 * AUDIOVISUAL * SPECIAL * [...]

6   [...] EDUCATION OF EMOTIONALLY HANDICAPPED CHILDREN.
    70
    VIKT=20,00 * MEDIA * HANDICAP * [...]

7   NATIONAL CENTER ON EDUCATIONAL MEDIA AND MATERIALS FOR THE
    HANDICAPPED: POLICIES AND PROCEDURES.
    AUG 70
    VIKT=20,00 * MEDIA * HANDICAP * ED044857

KTHB, 100 44 Stockholm, 08-230520
Term                                                  Documents     Documents
Type        Search term                               retrieved     ordered

KEYWORD     EVALUATION TECHNIQUES                        247            7
KEYWORD     MEASUREMENT TECHNIQUES                       145            5
KEYWORD     MEASUREMENT INSTRUMENTS                       88            1
KEYWORD     CLASSROOM OBSERVATION TECHNIQUES              76           14
WORD        TEACH/ (in various combinations)              74           13
WORD        EDUCATION/                                    29            1
WORD        INSTRUCT/                                     23            9
WORD        TECHNOLOGY/                                   21            5

                                                         703           55



We have not tried to eliminate overlapping occurrences of keywords and words here. From the user's
point of view his evaluation by documents ordered shows equal preference for the references selected by
keywords and by free text.
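
The truncation convention used in the free text field of this profile can be illustrated with a small Python sketch. It is not the VKA or ABACUS matching routine. We assume that a trailing slash accepts any word ending, that a term without a slash must match a whole word, and, as a further assumption, that a leading slash (as in /PROCESS/) denotes left-hand truncation. The function names are hypothetical.

```python
import re


def term_pattern(term):
    """Translate a profile term such as TEACH/ into a regular expression."""
    left = term.startswith("/")    # assumed notation for left-hand truncation
    right = term.endswith("/")     # right-hand truncation, as in TEACH/
    stem = re.escape(term.strip("/"))
    body = (r"\w*" if left else "") + stem + (r"\w*" if right else "")
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


def title_matches(term, title):
    """True if the (possibly truncated) term matches some word in the title."""
    return term_pattern(term).search(title) is not None
```

With this reading TEACH/ accepts TEACHING, the untruncated TECHNIQUE rejects TECHNIQUES, and a title word such as POSTFEEDBACK is found only when FEEDBACK is truncated on the left.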



Example 3 



The next example treats the problem of audiovisual aids for the mentally retarded. See Figure 2,
Profile 70 E, in Chapter 4 above. This profile has special interest since the user has sent back the graded
evaluation form for the output of both ERIC and ISI. The evaluation was as follows:



Tape         Number      Very                                       Copy
service      of runs     interesting    Interesting    Irrelevant   order

ERIC            5            17             28             31         16
ISI            12             1              5             25          1

Totals                       18             33             56         17



ERIC is obviously the central data base for queries of this kind. However, the journals covered by ISI
cannot be completely neglected. In spite of the high noise level of ISI, some relevant material has come
out. "Formular 3" shows what the output looks like. The user has ordered photocopies of items no. 1
and 4, which he has circled. The noise level for ERIC is over 40 per cent in this special case where ERIC
is the central data base.
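
Taking the noise level as the proportion of irrelevant references among those the user has graded, the figures for this profile can be recomputed as in the Python sketch below. The function name is ours, introduced only for the illustration.

```python
def noise_level(very_interesting, interesting, irrelevant):
    """Percentage of graded references that the user marked as irrelevant."""
    graded = very_interesting + interesting + irrelevant
    return 100 * irrelevant / graded


# Graded evaluation of Profile 70 E, from the table above.
eric_noise = noise_level(17, 28, 31)   # 31 irrelevant out of 76 graded
isi_noise = noise_level(1, 5, 25)      # 25 irrelevant out of 31 graded
```

For ERIC this gives a value just over 40 per cent, as stated in the text, while the corresponding value for ISI is roughly twice as high.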

Example 4 

Library services belong primarily to the educational field, so we could assume that ERIC would be an
appropriate data base. The query was about automation in libraries by data processing. The profile
(Figure 5), which contains 36 free text words, has been run on five data bases. The evaluation of
482 references from the different data bases follows:

                              Very
Data base       No. runs      interesting    Interesting    Irrelevant

ERIC                3             30             36            196
CICP                7              3             11             11
COMPENDEX           5             16              7              7
INSPEC              5              7             13             37
ISI                25              9             26             74

Total                             64             93            325

Noise level: ERIC 75%, COMPENDEX 23%, [...] 62%
Figure [...]

Profile 565

Subject    : Automation in libraries by data processing
Data bases : CICP, COMPENDEX, ERIC, INSPEC, ISI
Logic      : G & (F & (A or B) or A & B or C [...] or D & E)

Term                                            Term
Group     Search term              Weight       Type

A         ARCHIVE/                   2          WORD
A         BOOK/                      2          WORD
A         DOCUMENT/                  2          WORD
A         JOURNAL/                   2          WORD
A         LIBRARY                    2          WORD
B         ED[...]                    2          WORD
B         EDP                        2          WORD
B         ADMINISTR/                 2          WORD
B         AUTOMAT/                   2          WORD
B         COMPUT/                    2          WORD
B         CONTROL/                   2          WORD
B         DEVELOP/                   2          WORD
B         GOVERNMENT POLICY/         2          WORD
B         ORGAN/                     2          WORD
B         POLICY/                    2          WORD
B         RESEARCH/                  2          WORD
B         RETRIEV/                   2          WORD
B         ROUTIN/                    2          WORD
B         SEARCH/                    2          WORD
B         STOR/                      2          WORD
B         SYSTEM/                    2          WORD
B         TECHNIQUE/                 2          WORD
B         TREND/                     2          WORD
B         /PROCESS/                  2          WORD
C         LIBRAR/                    2          WORD
C         UNIVERSIT/                 2          WORD
D         COURSE/                    2          WORD
D         CURRICUL/                  2          WORD
D         EDUCAT/                    2          WORD
D         SCHOOL/                    2          WORD
D         TRAINING/                  2          WORD
D         UNIVERSIT/                 2          WORD
E         DOCUMENTATION/             2          WORD
E         INFORMATION/               2          WORD
E         LIBRAR/                    2          WORD
F         LEND/                      2          WORD
F         LOAN/                      2          WORD
Even if ERIC covers the major part of the very interesting references, the output from the other data
bases cannot be neglected. The noise level is notably high in ERIC.

The delay time for the same reference appearing in the various services has been studied. We know
that ISI is much faster than COMPENDEX or INSPEC, and also than ERIC. Thus for 51 identical references
for this profile, the ERIC data base was 8 months later as a median value than ISI and 5 months later than
INSPEC. However, delay time often has not even the slightest effect on the user. What happens instead
is that when he receives an early reference, he judges it as of low interest or irrelevant, while the same
reference appearing 3-6 months later is evaluated as interesting, and he orders a copy. In several cases
it seems that the continuous SDI service has a sort of subconscious learning effect on the user.

8. METHODS TO ESTABLISH A HELPFUL OUTPUT ORDER 

This paper is not intended as a primer on information retrieval for those interested in education, but the
reader might already have noticed in Figures 2 and 4, and in the "Formular 3", that there are indications
of a weighting procedure (VIKT = Weight). We should, therefore, like to mention that we are experimenting
with various weighting methods in order to establish a helpful arrangement of the output, so that references
early on the list will have higher probability of interest to the individual user than the later ones. The
method shown in Figure 2 is based upon the assumption that the words used in the profile and the words
occurring in a reference are related in such a way that the more the words co-occur, the higher the
probability that the reference is relevant to the query.

This gives us one way of arranging the output. Thus we note the number of co-occurrences and let
the search logic operate arithmetically to arrive at the values upon which we base the order. As can be
noted from the profile 70E in Figure 2, the weight, in general, is 2 assigned to all terms. However, the
user has regarded some terms of greater importance and assigned the weight 10 to them. The three words
which pick up the first reference in the printout "Formular 3" all have the weight of 10, two of which are
in the same term group, thus 10+10. The logical "and" is translated into multiplication, so the complete
expression will be: 10 x 2 (10) = 200, as the weight shows. In this case it seems to have worked to the
user's satisfaction, since he has ordered a copy by circling the reference.
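
The arithmetic reading of the search logic can be sketched as follows in Python. This is a minimal illustration under our own assumptions, not the program actually used: the weights of the matching terms within one term group are added (the logical "or"), the group sums are multiplied (the logical "and"), and the profile is taken to be a plain conjunction of groups. The group labels X and Y are hypothetical.

```python
from math import prod


def reference_weight(matched_weights_by_group):
    """Weight of one reference.

    matched_weights_by_group maps each term group of the profile to the list
    of weights of its terms that occur in the reference. Weights are summed
    within a group ("or") and the sums are multiplied across groups ("and").
    A group without any matching term contributes 0, so the "and" fails.
    """
    return prod(sum(weights) for weights in matched_weights_by_group.values())


# First reference of the printout: one matching term of weight 10 in one
# group, two matching terms of weight 10 in another group.
first_reference = reference_weight({"X": [10], "Y": [10, 10]})
```

The sketch reproduces the value 10 x (10 + 10) = 200 worked out above. Nested expressions mixing "and" and "or" at several levels would need a recursive evaluation of the logic, which is not shown here.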

Usually, we do not encourage the user to ascribe subjective weights, as we want to find out more
about objectively assigned weights. This brings us back to the list of word frequencies dealt with in
Chapter 6 above. We might, in particular, order the references on the basis of the frequencies of the
words in the data base, which is our next step in preparation. The underlying reasoning is as follows.

When forming the logical expression in a keyword based system arranged as an inverted file, it is
common to base the logical expression upon the number of documents indexed by each keyword. This
number indicates the frequency with which this keyword has been used for indexing. Thus, on-line
searches on a display terminal usually end by forming the logical expression that gives the minimum
output. This means that high frequency terms are looked upon as having less value than those with low
frequencies.

In a free text search system in the batch processing mode, a search can also be based upon term
frequencies using natural language if we build a frequency table from a large sample of references of
each data base, say around 30.000 references. The values for rank ordering could then be established
as the sum of the values of the co-occurring terms, if these are expressed in 1/n, where n is the frequency
of the term given by the frequency table (Tell 6). Such frequency tables are under construction for
ERIC and the other data bases.
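
A minimal Python sketch of this rank ordering is given below. It assumes whole-word matching of profile terms in the title and uses an invented frequency table for the example; neither the function names nor the sample frequencies come from the ERIC tables under construction.

```python
def frequency_score(title, profile_terms, frequency):
    """Sum of 1/n over the profile terms that occur in the title,
    where n is the term's frequency in the data base."""
    words = set(title.upper().split())
    return sum(1.0 / frequency[term]
               for term in profile_terms
               if term in words and term in frequency)


def rank_references(titles, profile_terms, frequency):
    """Order titles so that those with rare co-occurring terms come first."""
    return sorted(titles,
                  key=lambda title: frequency_score(title, profile_terms, frequency),
                  reverse=True)


# Hypothetical frequency table and titles, for illustration only.
sample_frequency = {"EDUCATION": 5000, "FEEDBACK": 50, "DELAY": 20}
sample_titles = ["Education today",
                 "Delay of feedback in education",
                 "Feedback"]
ranked = rank_references(sample_titles,
                         ["EDUCATION", "FEEDBACK", "DELAY"],
                         sample_frequency)
```

In this example the title with three co-occurring terms is listed first, and the title matching only the very frequent word EDUCATION is listed last.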

The weighting procedure is only the first step. We are going to study parsing and computational
linguistic methods in order to find out the contribution such methods can offer to output ordering. We hope
to arrive at shorter lists by introducing a cut-off when the weights are too low, thus saving computer
and user time.
9. PERSONNEL AND TRAINING

Being responsible for exploring the utility of computerized information services to scientific research,
higher education and industry, we have felt that one task has been to carry out research and development
of the kind which has been disclosed above. The other tasks are production, management, clerical support,
and supporting library service. The overall staff picture for running the SDI service is 10 full-time
equivalents. The number of subject specialists is 5, clerical equivalents 4, and programmers 1. When
the ERIC data base was introduced there was a pressing need for another subject specialist.

An agreement was reached between our library and the National Library for Psychology and
Education to make available a subject specialist on a 4/5 full-time basis; and at the same time they
pay for the tapes.

This specialist, the co-author Hemborg, has taken over the profile negotiation, coding, and retrieval
analysis which had previously been the task of the other co-author Weagren. In the transitional stage
that we are in at present, operating with two systems, ABACUS and VKA, the profile updating is
laborious, which has made it difficult, for example, to devote time to the construction of group profiles
of interest in the educational area. Both SDI and retrospective searches are tailor-made for the
individual and require personal attention of the subject specialist, and become relatively time-consuming,
while group profiles are cheaper to update, there being no necessity to adapt to individual requirements.

Also the library back-up service has been put under pressure since the introduction of the ERIC files.
Even if requests for copies of the references put out by ERIC RIE are passed on to the National Library for
Psychology and Education, where the microfiche collection is located, most references to journal articles
and technical reports outside the ERIC clearing-house collection are handled by our library from its
collections or by inter-library loans. In many cases photocopies are ordered from the National Lending
Library at Boston Spa, United Kingdom. This follow-up service is found to be important in order to
keep the interest of the users.

The SDI service has undertaken to train two subject specialists from the National Library. The training
period has been stretched out over several months. In May 1971, when the ERIC tapes were well run in, a
10-day seminar on modern information retrieval methods in the field of education and instructional
technology was also arranged. 24 persons whose activity had close connection with educational research
and training participated; they came from Sweden and the other Scandinavian countries.

As a result of this seminar several profiles were received. Recently, a number of profiles (26) have
been received from Finland, where the University of Jyväskylä has obtained a grant for experimenting
with computerized systems like ERIC. In this case the library in Jyväskylä tries to construct and maintain
the profiles, and undertakes the individual mailing of the search results they receive from us.

Our experience from trying to market the data bases in other fields to scientists and people in industry
has been that the most effective method is one-day seminars where the afternoon session is devoted to
group work, when every participant under the guidance of one of our staff constructs a profile in his field
of interest. We then promise to run it on a trial basis free of charge for a few months. Such a
procedure of "taking the service to the user" is, we think, valid also in the educational field in
order to attract potential users.

10. CONCLUSIONS

The introduction of the ERIC tape service in our SDI service has been described in this report, where
from February 1971 we have built up a user population around 150 profiles, a promising result for a service
new to the social science field. However, in order to assure good coverage a great number of the profiles
are also run on other services like ISI, COMPENDEX and INSPEC.
The efficiency of using keywords from the ERIC Thesaurus has been discussed. However, in order to
utilize the other data bases the profiles have to include natural language words. In the next few months
the Bulletin Signalétique will be added to the files, in which case the profiles will then have to be
translated into French in order to be searched on title words. In many cases the free text words perform
equally well as keywords on the ERIC files. As we use a combined search strategy, we regard indexing
with keywords identical to words in the title as futile. It would be more useful if some keywords took
the place of broad subject categories.

The user population which now subscribes to the ERIC service might not be representative of the
potential user population when the service has been run for some years, and the validity of the examples
given might not be high. However, user reactions in the form of requests for photocopies are high enough
to be able to infer from the other services we run that here is a good indicator of the user acceptability
of ERIC.

The "noise level" - the proportion of irrelevant references - of ERIC seems so far to be higher than we
are generally used to for a subject oriented file, but this may be because of the novelty of the service to
which we have not yet become accustomed. The users, however, are tolerant and accept it. We believe
that more efforts must be made in order to improve the quality of the output by sifting out some irrelevant
references. A weighting procedure is only the first step which later on will, for example, involve parsing
or computational linguistic methods.

One essential feature required to keep the interest of the user is a fast backup service of documents
or photocopies to the output from the SDI service.

In summing up we would like to note: 

(1) that the ERIC tapes are a useful addition but often not enough to answer the queries in the educational
field, whereas in a multi-data-base environment the user is assured of a good coverage,

(2) the present system is timely, and its economic aspects are not such as to prevent us from doing
occasional retrospective searches even on small profile batches,

(3) the quality of the output can with the present line of thinking eventually be brought under
control, and

(4) the present backup service with full documentation is necessary to give the user a reasonable
degree of convenience.
APPENDIX

FREQUENCY LIST OF THE TERMS USED IN THE TITLES
OF THE ERIC DATABASE

By

Björn V. Tell



The need for reformulation of the query when passing from one discipline to another is important
to an information retrieval system. This means that the system must be capable of performing
juxtaposition of disciplines and establish association links over the discipline barriers in reaction to the
queries. However, account has to be taken of the use of the language (scientific "jargon") in each
discipline, if a linkage is to be developed between existing disciplines, so that multi-disciplinary
problems can be attacked.



One way to solve this is to develop a communication and translation system between the various
scientific disciplines comprising the machine generation of vocabularies, concordances and thesauri, as
an interim stage which might eventually result in a common language for all sciences.

As a first step a frequency list of the words used in the ERIC data base has been constructed. The
bibliographic references to 19.917 ERIC reports and 27.573 journal articles have been merged, and in the
47.490 titles we have found 454.466 word occurrences, of which 19.856 were found to be unique words,
of which 551 were numbers. 10.453 of the words occurred with a frequency greater than one. The list
is presented alphabetically and by descending order of frequency.

The construction of the list has been done on an IBM 360/75 using the VKA retrieval program
supplemented by a special program for sorting, editing and printout.
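
For readers who wish to build a comparable list for another set of titles, the counting itself can be sketched in Python as follows. This is only an illustration with hypothetical names and sample titles, not the VKA program or the special sorting program mentioned above, and it assumes that a word is any unbroken run of letters or digits.

```python
import re
from collections import Counter


def title_word_frequencies(titles):
    """Count the occurrences of every word (letters or digits) in the titles."""
    counts = Counter()
    for title in titles:
        counts.update(re.findall(r"[A-Z0-9]+", title.upper()))
    return counts


# Hypothetical sample titles, for illustration only.
sample = ["Feedback in Education",
          "Education and Feedback Delay",
          "Education 1971"]
counts = title_word_frequencies(sample)

occurrences = sum(counts.values())                    # all word occurrences
unique_words = len(counts)                            # distinct words
numbers = sum(1 for word in counts if word.isdigit()) # distinct numeric "words"
repeated = sum(1 for n in counts.values() if n > 1)   # words with frequency > 1

alphabetical = sorted(counts.items())                 # alphabetical presentation
by_frequency = counts.most_common()                   # descending frequency
```

The four summary values correspond to the kinds of totals reported above for the 47.490 merged titles.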



Secretariat Note



The word frequency list has been submitted to Mr. Jean Viet, Paris, for his use in compiling
the Multilingual EUDISED Thesaurus.

It is interesting to observe that although there are four times as many word occurrences involved
in this analysis as there are in the COE-only analysis referred to in Chapter 6, the list of "top twenty"
significant terms accounting for 10 per cent of all word occurrences is hardly affected. The terms
CURRICULUM, EVALUATION and SOCIAL are replaced by PROJECT(S), UNIVERSIT(IES) - both
of which are included in the ERIC Thesaurus - and NEW. The frequency of occurrence of
EDUCATION(AL) reaches 16 per cent of the document titles, which means that it is used in some
30 per cent of bibliographic references to ERIC reports.